-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[IOTDB-17179] CLI: support automatic reconnection when connection is lost #17181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Detect connection-related SQLExceptions (refused, timeout, closed, reset, etc.) - In AbstractCli: rethrow connection-related SQLException from executeQuery, setTimeZone, showTimeZone so CLI can handle them - In Cli: on connection loss, close current connection, retry reconnect up to 3 times with 1s interval, then retry the failed command; print 'Connection lost. Reconnected. Retrying command.' on success; exit with clear message after all retries fail - Add isConnectionRelated() and matchesConnectionFailure() in AbstractCli for shared detection; openConnection(), setupConnection(), closeConnectionQuietly() and ReadLineResult in Cli for reconnect flow - Update AbstractCliTest to declare throws SQLException for handleInputCmd calls Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this is your first pull request in IoTDB project. Thanks for your contribution! IoTDB will be better because of you.
When reconnect succeeds but the retried command fails with a session/statement error (e.g. StatementId doesn't exist in this session), show a friendly message instead of the raw exception. Apply the same handling in AbstractCli.executeQuery so the message is shown both during reconnect-retry and when the user runs the next command. Add isSessionOrStatement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds automatic reconnection + retry behavior to the IoTDB CLI interactive session when connection-related failures occur, aligning CLI behavior with other IoTDB clients.
Changes:
- Introduces shared connection-failure/session-state error detection in
AbstractCliand propagates connection-relatedSQLExceptions to the main loop. - Refactors
Cli.receiveCommands()/readerReadLine()to attempt reconnect (bounded retries + backoff) and retry the failed input line. - Updates
AbstractCliTestfor the newthrows SQLExceptionmethod signatures.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
iotdb-client/cli/src/main/java/org/apache/iotdb/cli/Cli.java |
Implements reconnect/retry loop, connection open/setup/close helpers, and new read-loop result signaling. |
iotdb-client/cli/src/main/java/org/apache/iotdb/cli/AbstractCli.java |
Adds connection/session error classifiers and changes command execution to rethrow connection-related SQLExceptions. |
iotdb-client/cli/src/test/java/org/apache/iotdb/cli/AbstractCliTest.java |
Adjusts unit test compilation due to updated method signatures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try { | ||
| connection = openConnection(); | ||
| setupConnection(connection); | ||
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reconnect retry try { connection = openConnection(); setupConnection(connection); ... } can throw TException from setupConnection(), but the retry loop only catches SQLException. This means some connection-loss cases can bypass the reconnect logic and bubble out immediately. Suggestion: include TException (or a common supertype) in the reconnect-attempt catch and handle it as a reconnect failure (with the same retry/backoff), while still distinguishing non-connection SQL errors from command execution.
| if (isSessionOrStatementError(e)) { | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
executeQuery prints "Reconnected, but the previous command could not be completed..." for session/statement-state errors, but this method cannot know whether a reconnect actually happened. If the same server-side error occurs without any reconnect, this message is misleading. Suggestion: make the message not assume reconnection (e.g., "Session state was reset..."), or only print this in the reconnect-and-retry path where reconnection is known to have occurred.
| "Reconnected, but the previous command could not be completed. Please run your command again."); | |
| "Session state was reset and the previous command could not be completed. Please run your command again."); |
| } catch (SQLException e) { | ||
| if (isConnectionRelated(e)) { | ||
| return ReadLineResult.reconnectAndRetry(s); | ||
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reconnection trigger/loop is a significant behavior change but there are no unit tests exercising it (e.g., simulating a connection-related SQLException from processCommand, verifying retry count/backoff, and ensuring non-connection SQLExceptions do not trigger reconnect or an incorrect "Could not reconnect" exit). Consider adding focused tests around this new flow to prevent regressions.
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the reconnect-and-retry path, the return value of processCommand(...) is ignored. If the retried input contains an exit/quit statement (or otherwise returns false), the CLI will continue the main loop instead of stopping as it normally would. Suggestion: capture the boolean return from processCommand and convert it into a result.stop / break out of the outer loop when it indicates the user requested exit.
| } catch (SQLException e) { | ||
| if (isSessionOrStatementError(e)) { | ||
| // Reconnect succeeded but retry failed due to session/statement state; ask user to | ||
| // run the command again. | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); | ||
| reconnected = true; | ||
| break; | ||
| } | ||
| if (attempt == RECONNECT_RETRY_NUM) { | ||
| ctx.getErr() | ||
| .printf( | ||
| "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n", | ||
| IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM); | ||
| ctx.exit(CODE_ERROR); | ||
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the reconnect loop, any SQLException thrown while retrying the failed command is treated as a reconnect failure. This can incorrectly trigger further reconnect attempts (and eventually exit with "Could not reconnect") for non-connection SQL errors (e.g., syntax/permission errors) that happen after the reconnect succeeds. Suggestion: only retry when isConnectionRelated(e) is true; otherwise report the SQL error from the retried command and continue the main loop (or stop) without attempting further reconnects.
| try { | ||
| connection = openConnection(); | ||
| setupConnection(connection); | ||
| ctx.getPrinter().println("Connection lost. Reconnected. Retrying command."); | ||
| processCommand(ctx, result.failedCommand, connection); | ||
| reconnected = true; | ||
| break; | ||
| } catch (SQLException e) { | ||
| if (isSessionOrStatementError(e)) { | ||
| // Reconnect succeeded but retry failed due to session/statement state; ask user to | ||
| // run the command again. | ||
| ctx.getPrinter() | ||
| .println( | ||
| "Reconnected, but the previous command could not be completed. Please run your command again."); | ||
| reconnected = true; | ||
| break; | ||
| } | ||
| if (attempt == RECONNECT_RETRY_NUM) { | ||
| ctx.getErr() | ||
| .printf( | ||
| "%s: Could not reconnect after %d attempts. Please check that the server is running and try again.%n", | ||
| IOTDB_ERROR_PREFIX, RECONNECT_RETRY_NUM); | ||
| ctx.exit(CODE_ERROR); | ||
| } | ||
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reconnect attempts, if openConnection() succeeds but setupConnection() or processCommand() throws, the newly created connection is not closed before the next attempt (the variable is overwritten on the next iteration). This can leak sockets/sessions across retries. Suggestion: in the catch path, call closeConnectionQuietly(connection) and set connection = null before continuing to the next attempt.
| private static void setupConnection(IoTDBConnection connection) | ||
| throws java.sql.SQLException, org.apache.thrift.TException { | ||
| connection.setQueryTimeout(queryTimeout); | ||
| properties = connection.getServerProperties(); |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setupConnection() appends to AGGREGRATE_TIME_LIST every time a reconnect happens, but the list is static and never cleared/deduped. During repeated reconnects this will grow unbounded and may introduce duplicate entries. Suggestion: clear the list before adding, or change it to a Set (if ordering is not required).
| properties = connection.getServerProperties(); | |
| properties = connection.getServerProperties(); | |
| AGGREGRATE_TIME_LIST.clear(); |
| s = ctx.getLineReader().readLine(cliPrefix + "> ", null); | ||
| boolean continues = processCommand(ctx, s, connection); | ||
| if (!continues) { | ||
| return true; | ||
| try { | ||
| boolean continues = processCommand(ctx, s, connection); | ||
| if (!continues) { | ||
| return ReadLineResult.stopLoop(); | ||
| } | ||
| } catch (SQLException e) { | ||
| if (isConnectionRelated(e)) { | ||
| return ReadLineResult.reconnectAndRetry(s); | ||
| } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a connection-related SQLException occurs, the CLI retries the entire raw input line s. If the line contains multiple statements separated by ;, some statements may already have executed successfully before the failure, and retrying the whole line can re-run those statements (duplicate writes / side effects). Suggestion: either only retry the specific statement that failed (track progress inside processCommand), or disable auto-retry for multi-statement input and ask the user to rerun manually.
| closeConnectionQuietly(connection); | ||
| connection = null; | ||
| boolean reconnected = false; | ||
| for (int attempt = 1; attempt <= RECONNECT_RETRY_NUM; attempt++) { |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test is always true, because of this condition.
Test is always true, because of this condition.
Description
Add automatic reconnection to the IoTDB CLI when the connection to the server is lost during an interactive session (e.g. server restart, network blip, or idle timeout). The CLI no longer exits immediately on connection-related errors; it attempts to reconnect with the same parameters and retries the failed command, aligning behavior with the Session API, JDBC, and C++/Python clients.
Content1 — Detection and reconnection flow
SQLException. We treat an exception as connection-related if its message (or cause message, lowercased) contains any of:connection,refused,timeout,closed,reset,network,broken pipe. This logic lives inAbstractCli.isConnectionRelated(SQLException)andmatchesConnectionFailure(String)so it can be shared and reused.DriverManager.getConnectionand the existinginfoproperties. Helper methodsopenConnection(),setupConnection(), andcloseConnectionQuietly()inCliencapsulate open/setup/close so the main loop stays clear.RECONNECT_RETRY_NUMandRECONNECT_RETRY_INTERVAL_MSinClicontrol this; they are not yet user-configurable.Connection lost. Reconnected. Retrying command.If all reconnection attempts fail we print:IoTDB: Could not reconnect after 3 attempts. Please check that the server is running and try again.and exit with error code.Content2 — Class and method organization
isConnectionRelated(SQLException)(package-private static) andmatchesConnectionFailure(String)(private static) for shared detection. InexecuteQuery,setTimeZone, andshowTimeZone, we catchSQLException(orExceptionwhere the API does not throwSQLException) and rethrow whenisConnectionRelated(e); otherwise we keep the existing "print error and return error code" behavior.handleInputCmdandprocessCommandnow declarethrows SQLExceptionso connection failures propagate to the CLI loop instead of being swallowed.ReadLineResult(inner class withstop,failedCommand) and factory methodscontinueLoop(),stopLoop(),reconnectAndRetry(String)so the read-eval loop can signal "continue", "exit", or "reconnect and retry this command".receiveCommands()no longer uses try-with-resources for the connection; it holds the connection in a variable, and whenreaderReadLine()returns a result withfailedCommand != null, it runs the reconnect loop (close → retry open/setup → print message → retry command).readerReadLine()wrapsprocessCommand()in a try-catch; on connection-relatedSQLExceptionit returnsreconnectAndRetry(s)with the current line; on otherSQLExceptionit prints and returnsstopLoop().testHandleInputInputCmd()now declaresthrows SQLExceptionand importsjava.sql.SQLExceptionso it compiles with the updatedhandleInputCmdsignature.Content3 — Corner cases and alternatives
reconnectAndRetryand run the same reconnect/retry flow again (each time with up to 3 reconnect attempts). Non-connectionSQLExceptions still print the error and stop the loop (exit) as before. Interrupt and EOF handling inreaderReadLine()are unchanged.isSessionOrStatementError()in AbstractCli and show: "Reconnected, but the previous command could not be completed. Please run your command again." so the user is not shown the raw exception. This handling is applied both in the reconnect-retry path in Cli and in AbstractCli.executeQuery for the normal command path.This PR has:
Key changed/added classes (or packages if there are too many classes) in this PR
org.apache.iotdb.cli.AbstractCli—isConnectionRelated,matchesConnectionFailure,isSessionOrStatementError,matchesSessionOrStatementFailure; rethrow connection-relatedSQLExceptioninexecuteQuery,setTimeZone,showTimeZone; inexecuteQueryshow friendly message for session/statement errors;handleInputCmd,processCommandnowthrows SQLExceptionorg.apache.iotdb.cli.Cli—ReadLineResult,openConnection(),setupConnection(),closeConnectionQuietly(); refactoredreceiveCommands()andreaderReadLine()for reconnect-and-retry flow; handle session/statement error in reconnect-retry pathorg.apache.iotdb.cli.AbstractCliTest—testHandleInputInputCmd()updated forthrows SQLExceptionCloses #17179